amdgpu: accept raw-HSACO kernels with a partial implicit kernarg suffix#59
Draft
zjgarvey wants to merge 2 commits into
Draft
amdgpu: accept raw-HSACO kernels with a partial implicit kernarg suffix#59zjgarvey wants to merge 2 commits into
zjgarvey wants to merge 2 commits into
Conversation
Workaround for a regression in the MERGED form of PR#12 (raw HSACO custom kernargs): the new "truncated implicit kernarg suffix" check in iree_hal_amdgpu_executable_raw_hsaco_custom_kernarg_layout() rejects MIOpen's hand-written assembly conv kernels (e.g. miopenSp3AsmConv*), which declare a hidden arg but size kernarg_segment to only the partial suffix they use. The first GPU conv2d then fails hipModuleLoadDataEx; MIOpen unloads a garbage module handle and the binding segfaults in module_release. The check is redundant: the custom-direct dispatch path already reserves max(kernarg_size, explicit, implicit_offset + IMPLICIT_SIZE) and zero-fills the remainder, so a smaller kernarg_size is safe (the squashed form of #12 accepted these kernels). Drop the rejection; keep the overflow guard. Verified: full native-vs-HRX A/B 58/58 (record + compare); repro/conv2d_min.py passes. Local workaround pending an upstream fix in #12 — see hrx-pytorch-smoke-test/repro/ISSUE_conv2d_truncated_implicit_kernarg.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The layout change deliberately accepts hand-written kernels that declare a partial hidden suffix; dispatch reserves max(kernarg_size, implicit offset + implicit size). Update the stale expectation that failed the AMDGPU CI lanes. Verified locally: executable_test 8/8 pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A PR#12 over-strict check rejected raw-HSACO kernels (notably MIOpen hand-written asm conv kernels) whose declared kernarg segment is smaller than the full implicit-args block the loader expects. MIOpen ignores the load failure and unloads the uninitialized handle -> SIGSEGV (conv2d). Relax the validation: the reservation is already
max(kernarg_size, explicit, implicit_offset + IMPLICIT_ARGS_SIZE)and the dispatch path over-reserves + zero-fills, so a partial suffix is safe.Review
Agent review: clean-to-merge. The OOB hypothesis was adversarially disproven — the kernarg reservation is the correct MAX, computed independently of the removed check, so the full implicit-args write is always in-bounds. Rebased cleanly onto current main.
Test
Paired hip-cts:
iree-org/hip-ctsusers/zjgarvey/miopen-asmconv-kernarg-note— a hand-assembled gfx942 partial-kernarg kernel + load/dispatch harness (runs on MI300X), plus a note. The hand-written kernel does not yet reproduce the exact pre-fix MIOpen-unload SIGSEGV (brittle metadata); practical repro staysconv2d_min.py.🤖 Generated with Claude Code